AITopics | pl condition

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms Anonymous Author(s) Affiliation Address email

Neural Information Processing SystemsApr-30-2026, 05:48:15 GMT

Stochastic gradient descent (SGD) algorithm is the method of choice in many1 machine learning tasks thanks to its scalability and efficiency in dealing with2 large-scale problems. In this paper, we focus on the shuffling version of SGD3 which matches the mainstream practical heuristics. We show the convergence4 to a global solution of shuffling SGD for a class of non-convex functions un-5 der over-parameterized settings. Our analysis employs more relaxed non-convex6 assumptions than previous literature. Nevertheless, we maintain the desired compu-7 tational complexity as shuffling SGD has achieved in the general convex setting.8 1 Introduction9 In the last decade, neural network-based models have shown great success in many machine learning10 applications such as natural language processing [Collobert and Weston, 2008, Goldberg et al., 2018],11 computer vision and pattern recognition [Goodfellow et al., 2014, He and Sun, 2015].

artificial intelligence, convergence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

An Online Method for AClass of Distributionally Robust Optimization with Non-Convex Objectives

Neural Information Processing SystemsApr-25-2026, 22:33:30 GMT

In this paper, we propose a practical online method for solving a class of distributionally robust optimization (DRO) with non-convex objectives, which has important applications in machine learning for improving the robustness of neural networks. In the literature, most methods for solving DRO are based on stochastic primal-dual methods. However, primal-dual methods for DRO suffer from several drawbacks: (1) manipulating a high-dimensional dual variable corresponding to the size of data is time expensive; (2) they are not friendly to online learning where data is coming sequentially. To address these issues, we consider a class of DRO with an KL divergence regularization on the dual variables, transform the minmax problem into a compositional minimization problem, and propose practical duality-free online stochastic methods without requiring a large mini-batch size. We establish the state-of-the-art complexities of the proposed methods with and without a Polyak-Łojasiewicz (PL) condition of the objective. Empirical studies on large-scale deep learning tasks (i) demonstrate that our method can speed up the training by more than 2 times than baseline methods and save days of training time on a large-scale dataset with 265K images, and (ii) verify the supreme performance of DRO over Empirical Risk Minimization (ERM) on imbalanced datasets. Of independent interest, the proposed method can be also used for solving a family of stochastic compositional problems with state-of-the-art complexities.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Iowa (0.28)

Genre: Research Report (0.46)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

533fa796b43291fc61a9e812a50c3fb6-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 22:33:28 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Iowa (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

Neural Information Processing SystemsApr-25-2026, 02:24:01 GMT

In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity. The width of these general networks is characterized by the minimum indegree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.

artificial intelligence, machine learning, neural network, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms Anonymous Author(s) Affiliation Address email

Neural Information Processing SystemsFeb-17-2026, 20:41:10 GMT

While there has been much attention on the theoretical aspect of the traditional i.i.d.

artificial intelligence, convergence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(6 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms Lam M. Nguyen

Neural Information Processing SystemsFeb-17-2026, 20:41:06 GMT

Stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings.

artificial intelligence, convergence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(7 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

c981fd12b1d5703f19bd8289da9fc996-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 02:19:48 GMT

artificial intelligence, machine learning, optimization, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Maryland > Baltimore (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

A Generalized Alternating Method for Bilevel

Neural Information Processing SystemsFeb-17-2026, 02:19:44 GMT

Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) when addressing bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting.

artificial intelligence, machine learning, optimization, (16 more...)

Neural Information Processing Systems

Country: